On the regularization effect of stochastic gradient descent applied to least-squares
Authors
Abstract
We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E}\,\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k - b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality: the last term has one more matrix applied to the residual $u_k - u$ than the remaining terms, so if $x_k - x$ is mainly comprised of large singular vectors, this leads to quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values smoothes.
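To make the setting concrete, here is a minimal sketch, not taken from the paper, of the row-sampling SGD variant commonly used for $\|Ax-b\|_2^2$ (a Kaczmarz-type step with row $i$ drawn with probability $\|a_i\|_2^2/\|A\|_F^2$). The final diagnostic illustrates the energy cascade by comparing the error along large versus small singular directions; the dimension, iteration count, and seed are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact setup): row-sampling SGD / randomized
# Kaczmarz for ||Ax - b||_2^2 with an invertible A, tracking how the error
# x_k - x decomposes along singular vectors. The sampling rule p_i ~ ||a_i||^2
# and the diagnostics below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true

U, s, Vt = np.linalg.svd(A)            # singular vectors to monitor the cascade
row_norms2 = np.sum(A**2, axis=1)
probs = row_norms2 / row_norms2.sum()  # sample rows with p_i proportional to ||a_i||^2

x = np.zeros(n)
for k in range(20000):
    i = rng.choice(n, p=probs)
    a_i = A[i]
    # Kaczmarz-type SGD step: project onto the hyperplane <a_i, x> = b_i.
    x += (b[i] - a_i @ x) / row_norms2[i] * a_i

# Energy of the error along the largest vs. smallest singular directions:
err = Vt @ (x - x_true)
print("error along top-5 singular directions   :", np.linalg.norm(err[:5]))
print("error along bottom-5 singular directions:", np.linalg.norm(err[-5:]))
```

In this sketch the components of the error along directions with large singular values shrink much faster than those along small singular values, which is the regularization effect described above.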
Similar resources
The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent
Understanding the generalization of deep learning has raised lots of concerns recently, where the learning algorithms play an important role in generalization performance, such as stochastic gradient descent (SGD). Along this line, we particularly study the anisotropic noise introduced by SGD, and investigate its importance for the generalization in deep neural networks. Through a thorough empi...
Stochastic Optimization Algorithm Applied to Least Median of Squares Regression
The paper presents a stochastic optimization algorithm for computing least median of squares (LMS) regression, introduced by Rousseeuw and Leroy (1986). As the exact solution is hard to obtain, a random approximation is proposed, which is much cheaper in time and easy to program. A MATLAB program is included.
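The cited paper's specific algorithm (and its MATLAB program) is not reproduced here; the following is a generic, hedged Python sketch of the random-resampling idea behind approximate LMS: fit many random elemental subsets and keep the candidate with the smallest median squared residual. The function name and trial count are illustrative.

```python
# Hedged sketch of a generic random-subsampling approximation to least median
# of squares (LMS) regression; the cited paper's algorithm may differ.
import numpy as np

def lms_random(X, y, n_trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)   # random elemental subset
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        crit = np.median((y - X @ beta) ** 2)        # LMS objective on all data
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta, best_crit
```

A call such as `lms_random(X, y)` returns the best candidate coefficients found and the achieved median squared residual.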
A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and ad...
Stochastic Proximal Gradient Descent for Nuclear Norm Regularization
In this paper, we utilize stochastic optimization to reduce the space complexity of convex composite optimization with a nuclear norm regularizer, where the variable is a matrix of size m × n. By constructing a low-rank estimate of the gradient, we propose an iterative algorithm based on stochastic proximal gradient descent (SPGD), and take the last iterate of SPGD as the final solution. The ma...
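As a rough illustration only (the low-rank gradient estimator that gives the cited method its space savings is not reproduced), here is a generic stochastic proximal gradient sketch for a nuclear-norm-regularized objective, using singular-value soft-thresholding as the proximal map and returning the last iterate. The matrix-completion instance, step size, and batching are assumptions.

```python
# Generic sketch of stochastic proximal gradient descent with a nuclear-norm
# regularizer, illustrated on noisy matrix completion. The stochastic gradient
# below simply subsamples observed entries; it is not the cited low-rank estimator.
import numpy as np

def svt(Z, tau):
    """Prox of tau*||.||_*: soft-threshold the singular values of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def spgd_nuclear(M_obs, mask, lam=1.0, step=0.5, iters=300, batch=0.2, seed=0):
    rng = np.random.default_rng(seed)
    X = np.zeros_like(M_obs)
    obs = np.argwhere(mask)                       # indices of observed entries
    for _ in range(iters):
        sub = obs[rng.choice(len(obs), size=int(batch * len(obs)), replace=False)]
        G = np.zeros_like(X)
        # Unbiased subsampled gradient of 0.5*||P_Omega(X - M)||_F^2.
        G[sub[:, 0], sub[:, 1]] = (X - M_obs)[sub[:, 0], sub[:, 1]] / batch
        X = svt(X - step * G, step * lam)         # proximal (soft-thresholding) step
    return X                                      # last iterate, as in the cited analysis
```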
Iterate averaging as regularization for stochastic gradient descent
We propose and analyze a variant of the classic Polyak-Ruppert averaging scheme, broadly used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least squares regression, we show that this averaging scheme has the same regularizing effect, and indeed is asymptotic...
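A hedged sketch of the idea, assuming a simple exponential-moving-average implementation of geometrically decaying weights on top of plain SGD for least squares; the decay rate, step size, and problem instance are illustrative choices, not the parameters analyzed in the cited paper.

```python
# Minimal sketch: single-sample SGD on least squares with a geometrically
# weighted (exponential moving) average of the iterates. Parameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.5 * rng.standard_normal(n)

step, gamma = 0.01, 0.05
w = np.zeros(d)
w_avg = np.zeros(d)                           # geometrically weighted average
for k in range(50000):
    i = rng.integers(n)
    w -= step * (X[i] @ w - y[i]) * X[i]      # single-sample SGD step
    # Weight on iterate x_j decays like (1 - gamma)^(k - j):
    w_avg = (1 - gamma) * w_avg + gamma * w

print("last iterate error :", np.linalg.norm(w - w_true))
print("averaged error     :", np.linalg.norm(w_avg - w_true))
```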
Journal
Journal title: Electronic Transactions on Numerical Analysis
Year: 2021
ISSN: 1068-9613, 1097-4067
DOI: https://doi.org/10.1553/etna_vol54s610